Book scanning or book digitization (also: magazine scanning or magazine digitization) is the process of converting physical books and magazines into digital media such as digital images, electronic text, or e-books by using an image scanner. Large-scale book scanning projects have made many books available online. Digital books can be easily distributed, reproduced, and read on screen. Image scanners may be manual or automated. After scanning, software adjusts the document images by aligning them, cropping them, editing them, and converting them to text and final e-book form. Scanning resolution for book digitization varies depending on the purpose and nature of the material. High-end scanners capable of thousands of pages per hour can cost thousands of dollars. Projects such as Project Gutenberg, the Million Book Project, Google Books, and the Open Content Alliance scan books on a large scale.
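The cropping step mentioned above can be illustrated with a minimal sketch. It assumes a page image is held as a plain list of rows of grayscale values (0 for black, 255 for white); the function names, the `threshold` default, and the `margin` parameter are hypothetical choices for illustration and do not describe any particular scanning product.

```python
from typing import List, Tuple

# Assumed representation: rows of grayscale pixels, 0 = black, 255 = white.
Image = List[List[int]]


def content_bounds(image: Image, threshold: int = 200) -> Tuple[int, int, int, int]:
    """Return (top, left, bottom, right) around pixels darker than threshold.

    bottom and right are exclusive. Raises ValueError for an empty or blank page.
    """
    if not image or not image[0]:
        raise ValueError("image is empty")
    # Rows and columns that contain at least one dark ("ink") pixel.
    rows = [y for y, row in enumerate(image) if any(p < threshold for p in row)]
    cols = [x for x in range(len(image[0])) if any(row[x] < threshold for row in image)]
    if not rows or not cols:
        raise ValueError("page contains no pixels darker than the threshold")
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1


def crop_to_content(image: Image, threshold: int = 200, margin: int = 0) -> Image:
    """Crop the page to its dark content, keeping an optional margin in pixels."""
    top, left, bottom, right = content_bounds(image, threshold)
    # Clamp the margin so the crop never extends past the original image.
    top = max(top - margin, 0)
    left = max(left - margin, 0)
    bottom = min(bottom + margin, len(image))
    right = min(right + margin, len(image[0]))
    return [row[left:right] for row in image[top:bottom]]
```

Real post-processing software would typically also straighten skewed pages before cropping; this sketch covers only the bounding-box crop.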
In an ordinary commercial image scanner, the book is placed on a flat glass plate (or platen), and a light and optical array moves across the book underneath the glass. In manual book scanners, the glass plate extends to the edge of the scanner, making it easier to line up the book's spine.
Higher scanning resolutions capture fine details and support long-term preservation efforts, while a tiered approach balances quality against practical constraints such as storage capacity and limited resources. Under this strategy, institutions apply higher resolutions selectively to rare or significant materials and use standard resolutions for more common documents.
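The trade-off between resolution and storage can be made concrete with a short sketch. The tier names and dpi values below are illustrative assumptions, not figures from any digitization standard; the only fixed relationship is that the uncompressed size of a page image grows with the square of the resolution.

```python
# Hypothetical tiers: these dpi values are illustrative assumptions only.
TIER_DPI = {"common": 300, "significant": 400, "rare": 600}


def choose_dpi(tier: str) -> int:
    """Look up the scanning resolution assumed for a material tier."""
    try:
        return TIER_DPI[tier]
    except KeyError:
        raise ValueError(f"unknown tier: {tier!r}") from None


def uncompressed_bytes(width_in: float, height_in: float, dpi: int,
                       bytes_per_pixel: int = 3) -> int:
    """Estimate the uncompressed size of one page image.

    bytes_per_pixel=3 assumes 24-bit colour; use 1 for 8-bit grayscale.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


if __name__ == "__main__":
    # A 6 x 9 inch page at each assumed tier.
    for tier in TIER_DPI:
        dpi = choose_dpi(tier)
        size_mb = uncompressed_bytes(6, 9, dpi) / 1_000_000
        print(f"{tier}: {dpi} dpi, about {size_mb:.1f} MB uncompressed")
```

Doubling the resolution quadruples the uncompressed size of each page, which is why applying the highest tier only to selected materials saves considerable storage. Compressed file sizes depend on the format and content and are not modelled here.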
The advantage of this type of scanner is that it is much faster than an overhead scanner.
One of the main challenges of large-scale digitization is the sheer volume of books that must be scanned. In 2010, the total number of works that had appeared as books in human history was estimated to be around 130 million. All of these would have to be scanned and then made searchable online for the public to use as a universal library. Large organizations currently rely on three main approaches: outsourcing, scanning in-house using commercial book scanners, and scanning in-house using robotic scanning solutions.
With outsourcing, books are often shipped to low-cost providers in India or China for scanning. Alternatively, for reasons of convenience, safety, and improved technology, many organizations choose to scan in-house, using either overhead scanners, which are time-consuming, or digital camera-based scanning machines, which are substantially faster and are the method employed by the Internet Archive as well as Google. Traditional methods have included cutting off the book's spine and scanning the pages in an image scanner with automatic page-feeding capability, with subsequent rebinding of the loose pages.
Once a page is scanned, its text is either entered manually or extracted via optical character recognition (OCR), which is another major cost of book scanning projects.
Due to copyright issues, most scanned books are those that are out of copyright; however, Google Books is known to scan books still protected under copyright unless the publisher specifically prohibits this.
These projects establish and publish best practices for digitization and work with regional partners to digitize cultural heritage materials. Additional criteria for best practices have more recently been established in the UK, Australia and the European Union. Wisconsin Heritage Online is a collaborative digitization project modeled after the Colorado Collaborative Digitization Project. Wisconsin uses a wiki to build and distribute collaborative documentation. Georgia's collaborative digitization program, the Digital Library of Georgia, presents a seamless virtual library on the state's history and life, including more than a hundred digital collections from 60 institutions and 100 agencies of government. The Digital Library of Georgia is a GALILEO initiative based at the University of Georgia Libraries.
In the twentieth century, the Hill Museum and Manuscript Library photographed books in Ethiopia that were subsequently destroyed amidst political violence in 1975. The library has since worked to photograph manuscripts in Middle Eastern countries.
In South Asia, the Nanakshahi Trust is digitizing manuscripts in the Gurmukhī script.
In Australia, there have been many collaborative projects between the National Library of Australia and universities to improve the repository infrastructure in which digitized information would be stored (Libraries in the twenty-first century: Charting new directions in information services, edited by Stuart Ferguson, 2007, p. 84). These projects include the ARROW (Australian Research Repositories Online to the World) project and the APSR (Australian Partnership for Sustainable Repository) project.
Most high-end commercial robotic scanners use air and suction technology to turn and separate pages. These scanners use a vacuum or air suction to gently lift a page from the stack, while a puff of air turns the page over, allowing the device to scan both sides efficiently. Some use newer approaches such as bionic fingers for turning pages. Some scanners use ultrasonic or photoelectric sensors to detect when two pages have been turned at once, so that no pages are skipped. With reported speeds of up to 2,900 pages per hour, robotic book scanners are designed specifically for large-scale digitization projects.
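A reported throughput figure can be turned into a rough planning estimate. The sketch below treats 2,900 pages per hour as a best-case rate; the function names, the example page counts, and the hours per scanner are hypothetical inputs for illustration, and a real project would also have to allow for book handling, quality checks, and downtime.

```python
def scanner_hours(pages: int, pages_per_hour: int = 2900) -> float:
    """Hours one scanner needs for a number of pages at a constant rate."""
    if pages < 0 or pages_per_hour <= 0:
        raise ValueError("pages must be >= 0 and pages_per_hour must be > 0")
    return pages / pages_per_hour


def scanners_needed(total_pages: int, pages_per_hour: int,
                    hours_per_scanner: int) -> int:
    """Smallest number of scanners that covers total_pages in the time budget."""
    if total_pages < 0 or pages_per_hour <= 0 or hours_per_scanner <= 0:
        raise ValueError("inputs must be positive (total_pages may be zero)")
    capacity = pages_per_hour * hours_per_scanner  # pages one scanner can handle
    return (total_pages + capacity - 1) // capacity  # integer ceiling division


if __name__ == "__main__":
    # Hypothetical example: a 300-page book at the reported peak rate.
    minutes = scanner_hours(300) * 60
    print(f"A 300-page book takes about {minutes:.1f} minutes of scanning")
```

Because the rate is a reported maximum, results from this calculation are lower bounds on scanning time, not forecasts.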
Google's patent 7508978 describes an infrared camera technology that detects the three-dimensional shape of the page and automatically adjusts for it (The Secret Of Google's Book Scanning Machine Revealed, by Maureen Clements, April 30, 2009). Robotic book scanners that use air and suction technology rely on specialized systems to turn and separate pages without damaging fragile or rare books.